Three-dimensional stacked memory optimizations for latency and power

ABSTRACT

An aspect includes receiving a request to write data to a memory that includes a stack of memory devices, each of the memory devices communicatively coupled to at least one other of the memory devices in the stack via a through silicon via (TSV). The write request is received by a hypervisor from an application executing on a virtual machine managed by the hypervisor. In response to receiving the request, a latency requirement of accesses to the write data is determined. A physical location on a memory device in the stack of memory devices is assigned to the write data based at least in part on the latency requirement and a position of the memory device in the stack of memory devices. A write command that includes the physical location and the write data is sent to a memory controller.

BACKGROUND

Embodiments of the invention relate to computer memory, and more specifically to three-dimensional (3D) stacked memory optimizations for latency and power.

High speed server systems with large memory capacities are becoming increasingly important in order to support ever growing customer demands. Modern portable devices require high capacity memory with low latency and a compact form factor. 3D memory stacking solutions can be utilized to provide higher capacity memory within a smaller footprint. The stacking of multiple memory integrated circuits (ICs), or chips, also provides an improvement in electrical performance due to shorter interconnects. One technique that is used to stack chips is referred to as through-silicon via (TSV) where vertical copper channels are built into each chip so that when they are placed on top of each other, the TSVs connect the chips together. TSVs allow for stacking of volatile dynamic random access memory (DRAM) devices with a processor to build very compact devices for portable applications. TSV techniques also allow 3D stacking of memory devices to create dense non-volatile memory such as flash or solid state drives with high capacity.

SUMMARY

Embodiments of the invention include methods, systems, and computer program products for three-dimensional (3D) stacked memory optimizations for latency and power. An example method includes receiving a request to write data to a memory. The memory includes a stack of memory devices, each of the memory devices communicatively coupled to at least one other of the memory devices in the stack via a through silicon via (TSV). The write request is received by a hypervisor from an application executing on a virtual machine managed by the hypervisor. In response to receiving the request, a latency requirement of accesses to the write data is determined. A physical location on a memory device in the stack of memory devices is assigned to the write data based at least in part on the latency requirement and a position of the memory device in the stack of memory devices. A write command that includes the physical location and the write data is sent to a memory controller.

Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with the advantages and the features, refer to the description and to the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the claims at the conclusion of the specification. The foregoing and other features and advantages of the invention are apparent from the following detailed description taken in conjunction with the accompanying drawings in which:

FIG. 1 depicts a block diagram of a system for memory optimizations for latency and power in accordance with one or more embodiments of the invention;

FIG. 2 depicts a block diagram of a system for memory optimizations for latency and power in a three-dimensional (3D) stacked memory in accordance with one or more embodiments of the invention;

FIG. 3 depicts a flow diagram of a process for performing memory optimizations for latency and power in a 3D stacked memory in accordance with one or more embodiments of the invention; and

FIG. 4 depicts a block diagram of a system for memory optimizations in a two-dimensional (2D) memory in accordance with one or more embodiments of the invention.

DETAILED DESCRIPTION

One or more embodiments of the present invention provide memory allocation that is performed by matching application requirements with memory devices at particular locations within a three-dimensional (3D) stack of memory devices that can support the application requirements. Memory space is allocated on different memory devices at different positions in the 3D stack based on application requirements that can include but are not limited to an estimated need for latency when accessing the data, an estimate of how long the data will be resident in the memory, and an estimated frequency of accesses to the data. By using information about the application when allocating the memory, the 3D stacked memory can be optimized for latency and power. 3D packaging allows stacking of multiple memory chips (e.g., memory devices) that are connected using through-silicon vias (TSVs) to provide high memory density in a reduced form factor. By placing data in the 3D stack based on application requirements, one or more embodiments of the invention provide the ability to incorporate traditional reliability, availability, and serviceability (RAS) functions in 3D stacked memory.

Stacked memory can present numerous power and thermal challenges. For example, in a four chip stacked memory, each memory chip's thermal resistance to the package varies mainly because of its position in the stack. As the number of memory chips in a stack increases (e.g., to eight, sixteen, one-hundred and twenty-eight, etc.), the thermal distribution becomes increasingly complex and varied. Typically, top and bottom memory chips in the stack connect to the package/pins/circuit board and therefore have a better thermal profile (e.g., are cooler) when compared to memory chips in the middle of the stack. Ensuring an identical thermal profile across all memory chips in the stack would require each memory chip design to be different and would make memory chip fabrication complicated. Due at least in part to the different thermal profiles of memory chips, memory management can be relatively complex when memory chips are stacked together (e.g., when compared to non-stacked memory chips).

In accordance with one or more embodiments of the invention, the memory devices in the 3D stack are homogeneous and include the same type of memory devices. The memory devices can be implemented by any memory devices capable of being stacked such as, but not limited to: dynamic random access memory (DRAM) devices, flash memory devices, spin-transfer torque magnetoresistive random access memory (STT-MRAM) devices, static random access memory (SRAM) devices, and future slow memory devices. In accordance with one or more other embodiments of the invention, the memory devices in the 3D stack are heterogeneous and include two or more different types of memory devices such as, but not limited to: DRAM devices, flash memory devices, STT-MRAM devices, SRAM devices, and future slow memory devices. In accordance with one or more embodiments of the invention, thermal dissipation is optimized in a heterogeneous memory stack by having memory devices with the highest power (e.g., SRAM) on the top and bottom of the stack, by having memory devices with intermediate power (e.g., DRAM) next to the SRAM devices, and by having lower power memory devices (e.g., flash) in the middle of the stack.
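The thermal ordering of a heterogeneous stack described above can be illustrated with a minimal sketch. It is not taken from the specification: the function name arrange_stack, the device labels, and the relative power numbers are hypothetical, and the alternating placement is only one simple way to put the highest power devices at the two ends of the stack and the lowest power devices in the middle.

```python
def arrange_stack(devices):
    """Order devices bottom-to-top with the highest power at both ends.

    `devices` is a list of (name, relative_power) pairs. The power values
    are illustrative assumptions, not figures from the specification.
    """
    # Sort from highest to lowest power; Python's sort is stable, so ties
    # keep their input order.
    by_power = sorted(devices, key=lambda d: d[1], reverse=True)

    bottom_half = []  # filled from the bottom of the stack upward
    top_half = []     # filled from the top of the stack downward
    for index, (name, _power) in enumerate(by_power):
        # Alternate between the two ends so power decreases toward the middle.
        if index % 2 == 0:
            bottom_half.append(name)
        else:
            top_half.append(name)

    # Reverse the top half so the final list reads bottom-to-top.
    return bottom_half + top_half[::-1]


example_devices = [
    ("SRAM-0", 3.0), ("SRAM-1", 3.0),
    ("DRAM-0", 2.0), ("DRAM-1", 2.0),
    ("flash-0", 1.0), ("flash-1", 1.0),
]
print(arrange_stack(example_devices))
```

With the example input, the SRAM devices land at the two ends, the DRAM devices sit next to them, and the flash devices occupy the middle of the stack.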

In accordance with one or more embodiments of the invention, depending on a latency requirement specified by an application, a memory allocation for the application is serviced using a memory device at a particular type of memory location in the 3D stack. For example, if a memory request from the application requires a low latency, a relatively lower latency non-volatile memory (NVM) that is located, for example, near the bottom of the DRAM stack can be allocated to the memory request. In accordance with one or more embodiments of the invention, a hypervisor keeps track of which rank(s) or memory region(s) contain slower memory having higher latencies and which rank(s) or memory region(s) contain faster memory having lower latencies. The application or memory allocation can be performed based on specified application requirements and a location of the memory chip within the 3D stack. For example, slow memory chips, which use less power and generate less heat, can be located in the middle of the 3D stack where they may be thermally challenged (e.g., it is more difficult to dissipate the heat that they generate due to their location in the stack).

In accordance with one or more embodiments of the invention, a tag bit is received from a processor executing an application that is requesting a memory access. The tag bit indicates characteristics of the write data such as, but not limited to, a latency requirement, a predicted frequency of access, and a predicted length of time that the write data will be stored on the memory. The tag bit can also indicate characteristics of the write data indirectly by assigning characteristics to the application which apply to all write requests from the application. The tag bit can indicate that the incoming request requires service as a high priority request with information returned back to the processor immediately. For this type of timing critical request that requires a low latency, data associated with the request can be assigned a memory device that is at the bottom of the stack. This avoids the additional latency that the 3D stack would otherwise induce, particularly in very tall stacks of memory chips, such as DRAMs.
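One way to picture the tag is as a small bit field that the hypervisor decodes into write data characteristics. The sketch below is an assumption for illustration only: the specification does not define a bit layout, so the flag names, the bit positions, and the decode_tag helper are all hypothetical.

```python
from enum import IntFlag


class WriteTag(IntFlag):
    """Hypothetical tag layout; the bit assignments are assumptions."""
    LOW_LATENCY = 0b001      # request is timing critical
    FREQUENT_ACCESS = 0b010  # data is predicted to be accessed often
    LONG_RESIDENCY = 0b100   # data is predicted to stay in memory a long time


def decode_tag(raw_tag):
    """Translate a raw integer tag into named write data characteristics."""
    # Mask off any bits this sketch does not define before interpreting them.
    tag = WriteTag(raw_tag & 0b111)
    return {
        "low_latency": bool(tag & WriteTag.LOW_LATENCY),
        "frequent_access": bool(tag & WriteTag.FREQUENT_ACCESS),
        "long_residency": bool(tag & WriteTag.LONG_RESIDENCY),
    }


# A timing critical request whose data is also expected to be long lived.
print(decode_tag(0b101))
```

A per-application default, as mentioned above, could be modeled by looking up a stored raw tag for the application whenever a request carries none.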

In accordance with one or more embodiments of the invention, frequently accessed data that is not timing critical can be placed at the top of the stack, where thermal challenges can be addressed by a heat sink at the top of the stack. The tag bit can also indicate how long the data will likely be resident in the memory and/or an expected frequency of access to the data. The information provided by the tag can be used to direct where to store the data in a 3D memory stack to optimize first access latency deltas of different layers in the memory stack while taking into consideration the different thermal constraints of the different levels in the stack. As used herein, the term “first access latency” refers to the read latency of the fastest chip, or memory device, in the stack, that is, the chip that delivers data on the bus first compared to other chips in homogeneous or heterogeneous chip configurations.

Turning now to FIG. 1, a block diagram of a system 100 for memory optimizations for latency and power is generally shown in accordance with one or more embodiments of the invention. The system 100 shown in FIG. 1 includes a host computing platform 180. The host computing platform 180 shown in FIG. 1 includes central processing unit (CPU) 110, memory 120, and memory controller 130. In accordance with an embodiment of the invention, operating system 140 executes on the CPU 110 and includes hypervisor 170 which creates and executes virtual machines 150. As shown in the embodiment of FIG. 1, each of the virtual machines 150 hosts the operation of one or more applications 160. In accordance with one or more embodiments, operating system 140 moderates the utilization of the CPU 110 and the memory 120 for host virtual machines 150.

In accordance with one or more embodiments of the invention, the memory controller 130 can receive write requests from the hypervisor 170. The write requests from the hypervisor 170 contain write data to be written to the memory 120 and a physical address of a location in the memory 120 to which the data will be written. In an embodiment, the hypervisor 170 contains a memory map for each application 160. The hypervisor 170 uses the memory map along with a tag received from an application 160 to determine a physical location in the memory 120 for the write data.

In accordance with one or more embodiments of the invention, the memory 120 is implemented by one or more memory modules each containing a plurality of memory devices including stacked memory devices. In accordance with one or more embodiments of the invention, the memory devices are stacked on top of each other and connected to each other via one or more through-silicon vias (TSVs).

The system 100 shown in FIG. 1 is one example of a configuration that may be utilized to perform the processing described herein. Although the system 100 has been depicted with only a single host computing platform 180, CPU 110, memory 120, and memory controller 130, it will be understood that embodiments can operate in systems with two or more of the host computing platform 180, CPU 110, memory 120, and/or memory controller 130. In an embodiment, the CPU 110, memory 120, and memory controller 130 are not located within the host computing platform 180. For example, the memory 120 and memory controller 130 may be located in one physical location (e.g., on a memory module) while the CPU 110 is located in another physical location (e.g., the CPU 110 accesses the memory controller 130 via a network). In addition, portions of the processing described herein may span one or more of the operating system 140, CPU 110, memory 120, and memory controller 130.

Turning now to FIG. 2, a block diagram of a system 200 for memory optimizations for latency and power in a 3D stacked memory is generally shown in accordance with one or more embodiments of the invention. The memory stack 204 shown in FIG. 2 is heterogeneous, that is, it uses different memory types in the different layers of the memory stack 204 including DRAMs, phase change memory (PCM), and STT memory. In the example shown, faster higher power memory (e.g., DRAM) is located in the top and bottom layers of the memory stack where heat conduction is best. Slower (e.g., slower access speed) and lower power memory (when compared to DRAM) such as PCM and STT are used internal to the stack where heat removal is most challenged. Frequently accessed data can be stored in the DRAMs which are byte addressable at the bottom of the memory stack 204. The memory stack 204 shown in FIG. 2 is located on a dual in-line memory module (DIMM) 202 and a logic chip 206 is located on the bottom of the stack. Also shown in FIG. 2 is a hypervisor 212 executing on host processor 210 to provide memory commands to a memory controller 208.

In accordance with one or more embodiments, a tag bit is received from an application and used to inform the hypervisor 212 of the type of memory (e.g., high latency, low latency) that should be allocated for the write data. The write data in the write request can include application data and/or a workload being executed by the application.

The memory stack 204 shown in FIG. 2 is just one example of a memory stack that may be implemented by one or more embodiments of the invention. Any combination and number of memory devices can be implemented by one or more embodiments of the invention. An example of another memory stack includes memory devices with the highest power (e.g., SRAM) on the top and bottom of the stack, memory devices with less power than SRAMs (e.g., DRAMs) next to the SRAM devices, and memory devices with less power than DRAMs (e.g., flash) in the middle of the stack. In another example, the memory stack is homogeneous and contains all DRAMs or all SRAMs.

Turning now to FIG. 3, a flow diagram 300 of a process for performing memory optimizations for latency and power in a 3D stacked memory is generally shown in accordance with one or more embodiments of the invention. The processing shown in FIG. 3 can be performed by a hypervisor such as hypervisor 170 of FIG. 1. At block 302, a write request is received by the hypervisor (e.g., hypervisor 170 of FIG. 1) from an application (e.g., application 160 of FIG. 1). At block 304, the hypervisor determines a latency requirement associated with data being written by the write request. In accordance with one or more embodiments, the application communicates the latency requirement to the hypervisor using a tag that is included in the write request. The tag can identify other write data characteristics such as, but not limited to, a predicted access frequency of the write data and a predicted length of time that the write data will be stored in the memory. In accordance with one or more embodiments, a tag is not utilized and the write data characteristics are determined by the hypervisor based on characteristics of the application making the write request that are stored, for example, in a table accessible by the hypervisor.

As used herein, the term “write data” refers to workload data (e.g., application code) as well as to application data (e.g., data written by the application code).

At block 306, the hypervisor assigns a physical location on a memory device in the stack of memory devices to the write data. The assigning is based at least in part on the determined latency requirement and a position of the memory device in the stack of memory devices. For example, if the latency requirement is a low latency, a memory location on a memory device in a portion of the stack of memory devices having a low latency will be selected by the hypervisor and assigned to the write data. In another example, if the write data is expected to be frequently accessed but does not have a low latency requirement, the hypervisor can select a physical location on a memory device at the top of the stack (e.g., near a heat sink). In a further example, if the write data is expected to be resident in the memory for a relatively long period of time and not accessed frequently, the hypervisor can select a physical location on a memory device in the middle of the stack. In addition to the position in the stack, the hypervisor can also consider a type of the memory device (e.g., speed, thermal characteristic, power, capacity, volatility, etc.) when assigning the physical location.
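The three placement examples given for block 306 can be summarized as a small decision function. This is a hedged sketch rather than the claimed method: the function name, the boolean inputs, and the region labels are hypothetical, low latency data is mapped to the bottom of the stack as in the tag discussion earlier in this description, and the fallback for data matching none of the examples is an assumption made only so the function is total.

```python
def choose_stack_position(low_latency, frequent_access, long_residency):
    """Map predicted write data characteristics to a region of the stack."""
    if low_latency:
        # Timing critical data goes to the low latency portion of the stack,
        # taken here to be the bottom.
        return "bottom"
    if frequent_access:
        # Frequently accessed but not timing critical: top of the stack,
        # near a heat sink.
        return "top"
    if long_residency:
        # Long lived and rarely accessed: middle of the stack.
        return "middle"
    # Not covered by the description; an arbitrary choice for this sketch.
    return "middle"


for characteristics in [(True, True, False), (False, True, False), (False, False, True)]:
    print(characteristics, "->", choose_stack_position(*characteristics))
```

A fuller policy would also weigh the device type (speed, thermal characteristic, power, capacity, volatility) mentioned above, which this sketch leaves out.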

In accordance with one or more embodiments of the invention, the hypervisor has access to a memory map for each application which it uses to assign the physical location in the memory stack. In addition, the hypervisor has information about what types of memory devices are in the stack, where they are physically located, and the type of write data characteristics they support (e.g., low latency, etc.).

At block 308, the hypervisor sends a write command to a memory controller that includes the physical location in the memory stack and the write data.
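Blocks 302 through 308 can be traced end to end in a short sketch. Everything named here is hypothetical: the WriteRequest and WriteCommand records, the RecordingController stand-in for a memory controller, the per-device latency table, the per-application default table, and the simple bump allocation of offsets are assumptions chosen to keep the example self-contained, not details from the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WriteRequest:
    app_id: str
    data: bytes
    tag: Optional[str] = None  # "low" or "high" latency requirement, if tagged


@dataclass
class WriteCommand:
    device: int   # index of the memory device in the stack (0 = bottom)
    offset: int   # byte offset within that device
    data: bytes


class RecordingController:
    """Stand-in memory controller that only records the commands it is sent."""
    def __init__(self):
        self.commands = []

    def write(self, command):
        self.commands.append(command)


class Hypervisor:
    def __init__(self, controller, device_latency, app_defaults):
        self.controller = controller
        self.device_latency = device_latency  # device index -> "low" / "high"
        self.app_defaults = app_defaults      # app id -> default requirement
        self.next_free = {device: 0 for device in device_latency}

    def handle_write(self, request):
        # Block 302: the write request has been received from the application.
        # Block 304: determine the latency requirement, from the tag if
        # present, otherwise from the stored per-application table.
        requirement = request.tag or self.app_defaults.get(request.app_id, "high")
        # Block 306: assign a physical location on a device that supports it.
        device = self._pick_device(requirement)
        offset = self.next_free[device]
        self.next_free[device] += len(request.data)
        # Block 308: send the write command to the memory controller.
        command = WriteCommand(device, offset, request.data)
        self.controller.write(command)
        return command

    def _pick_device(self, requirement):
        for device in sorted(self.device_latency):
            if self.device_latency[device] == requirement:
                return device
        raise LookupError("no device supports requirement: " + requirement)


controller = RecordingController()
hypervisor = Hypervisor(controller, {0: "low", 1: "high", 2: "high"}, {"batch": "high"})
print(hypervisor.handle_write(WriteRequest("db", b"abcd", tag="low")))
print(hypervisor.handle_write(WriteRequest("batch", b"xy")))
```

The sketch ignores device capacity and picks the lowest numbered matching device; a real allocator would consult the per-application memory map described above.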

In accordance with other embodiments of the invention, the hypervisor communicates the write data characteristics to the memory controller and the memory controller selects the memory device and the physical location on the memory device based on the write data characteristics.

Turning now to FIG. 4, a block diagram of a system 400 for memory optimizations in a two-dimensional (2D) memory is generally shown in accordance with one or more embodiments of the invention. The system 400 of FIG. 4 includes a hypervisor 412 executing on a host processor 410 to interface with a memory controller 408. The memory controller 408 is electrically coupled to a slow memory 402 and a fast memory 404. In accordance with one or more embodiments, a tag bit is received from an application and used to inform the hypervisor 412 of the type of memory (e.g., slow or high latency, fast or low latency) that should be allocated for the write data. If the application specifies fast memory, then the hypervisor 412 can instruct (e.g., via a tag bit or physical address) the memory controller 408 to write the write data to the fast memory 404. Similarly, if the application specifies slow memory, then the hypervisor 412 can instruct (e.g., via a tag bit or physical address) the memory controller 408 to write the write data to the slow memory 402.
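The 2D arrangement of FIG. 4 reduces to routing each write to one of two memories. The following sketch assumes a hypothetical TwoTierController whose fast and slow memories are modeled as dictionaries, and a single boolean standing in for the tag bit; none of these names come from the specification.

```python
class TwoTierController:
    """Stand-in for memory controller 408 with a fast and a slow memory."""
    def __init__(self):
        self.fast = {}  # models fast memory 404: address -> data
        self.slow = {}  # models slow memory 402: address -> data

    def write(self, address, data, use_fast):
        target = self.fast if use_fast else self.slow
        target[address] = data


def hypervisor_write(controller, address, data, tag_fast):
    """Forward the application's tag bit to the controller as a routing hint."""
    controller.write(address, data, use_fast=tag_fast)


controller = TwoTierController()
hypervisor_write(controller, 0x10, b"hot", tag_fast=True)
hypervisor_write(controller, 0x20, b"cold", tag_fast=False)
print(controller.fast, controller.slow)
```

The description also allows the hypervisor to express the same choice through the physical address instead of a tag bit, in which case the controller would select the memory from the address range.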

Technical effects and benefits of embodiments of the present invention include the ability to optimize latency and power in a stack of memory devices by placing data in memory locations based on latency and/or power requirements of the data being stored. For example, data requiring lower latency access can be stored on memory devices closer to the bottom of the stack and data that is frequently accessed (and requires more power) can be stored closer to the edges of the stack or near heat sinks. By steering data that requires low latency to memory devices that provide low latency and data that has a higher latency requirement to high latency memory devices, space can be freed up on the low latency memory devices for future low latency data storage. In addition, power consumption can be optimized by steering data to particular memory devices based on a predicted power consumption of the data.

The terminology used herein is for the purpose of describing particular embodiments of the invention only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiments of the invention were chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

1. A method comprising: receiving a request to write data to a memory, the memory comprising a stack of memory devices, each of the memory devices communicatively coupled to at least one other of the memory devices in the stack via a through silicon via (TSV), the request to write data received by a hypervisor from an application executing on a virtual machine managed by the hypervisor; and in response to receiving the request: determining a latency requirement of accesses to the write data; assigning a physical location on a memory device in the stack of memory devices to the write data, the assigning based at least in part on the latency requirement, a predicted power consumption of the write data, and a position of the memory device in the stack of memory devices; and sending a write command to a memory controller, the write command including the physical location and the write data.
2. The method of claim 1, wherein the request includes a tag that specifies the latency requirement.
3. The method of claim 1, further comprising determining a predicted access frequency of the write data, wherein the assigning is further based at least in part on the predicted access frequency of the write data.
4. The method of claim 1, further comprising determining a predicted length of time that the write data will be stored in the memory, wherein the assigning is further based at least in part on the predicted length of time that the write data will be stored in the memory.
5. The method of claim 1, wherein the memory devices are homogenous.
6. The method of claim 1, wherein the memory devices are heterogeneous.
7. The method of claim 6, wherein the assigning is further based on a type of the memory device.
8. The method of claim 6, wherein the memory devices are placed in the stack in an order that is based on a thermal characteristic of at least one of the memory devices in the stack.
9. The method of claim 6, wherein the memory devices are placed in the stack in an order that is based on an access speed of at least one of the memory devices in the stack.
10. A system comprising: a first memory having computer readable instructions; and one or more processors for executing the computer readable instructions, the computer readable instructions controlling the one or more processors to perform operations comprising: receiving a request to write data to a second memory, the second memory comprising a stack of memory devices, each of the memory devices communicatively coupled to at least one other of the memory devices in the stack via a through silicon via (TSV), the request to write data received by a hypervisor from an application executing on a virtual machine managed by the hypervisor; and in response to receiving the request: determining a latency requirement of accesses to the write data; assigning a physical location on a memory device in the stack of memory devices to the write data, the assigning based at least in part on the latency requirement, a predicted power consumption of the write data, and a position of the memory device in the stack of memory devices; and sending a write command to a memory controller, the write command including the physical location and the write data.
11. The system of claim 10, wherein the request includes a tag that specifies the latency requirement.
12. The system of claim 10, wherein the operations further comprise determining a predicted access frequency of the write data, wherein the assigning is further based at least in part on the predicted access frequency of the write data.
13. The system of claim 10, wherein the operations further comprise determining a predicted length of time that the write data will be stored in the second memory, wherein the assigning is further based at least in part on the predicted length of time that the write data will be stored in the second memory.
14. The system of claim 10, wherein the memory devices are homogenous.
15. The system of claim 10, wherein the memory devices are heterogeneous.
16. The system of claim 15, wherein the assigning is further based on a type of the memory device.
17. The system of claim 15, wherein the memory devices are placed in the stack in an order that is based on a thermal characteristic of at least one of the memory devices in the stack.
18. The system of claim 15, wherein the memory devices are placed in the stack in an order that is based on an access speed of at least one of the memory devices in the stack.
19. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to perform operations comprising: receiving a request to write data to a memory, the memory comprising a stack of memory devices, each of the memory devices communicatively coupled to at least one other of the memory devices in the stack via a through silicon via (TSV), the request to write data received by a hypervisor from an application executing on a virtual machine managed by the hypervisor; and in response to receiving the request: determining a latency requirement of accesses to the write data; assigning a physical location on a memory device in the stack of memory devices to the write data, the assigning based at least in part on the latency requirement, a predicted power consumption of the write data, and a position of the memory device in the stack of memory devices; and sending a write command to a memory controller, the write command including the physical location and the write data.
20. The computer program product of claim 19, wherein the request includes a tag that specifies the latency requirement.